Econ 435-5: Quantitative methods in economics Summer 2007
Final exam
Friday August 10, 15.30-18.30
The exam is divided into three exercises. The whole exam is graded on 100 marks. The marks for each
question are given in parentheses (I reserve the right to modify them slightly).
Reminder: If Z is distributed as a N(0,1), then
$\Pr[|Z| \le 1.64] \approx 0.90$, $\Pr[Z \le 1.28] \approx 0.90$,
$\Pr[|Z| \le 1.96] \approx 0.95$, $\Pr[Z \le 1.64] \approx 0.95$,
$\Pr[|Z| \le 2.58] \approx 0.99$, $\Pr[Z \le 2.33] \approx 0.99$.
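These tabulated normal probabilities can be checked with Python's standard library. The sketch below uses `statistics.NormalDist` and only reproduces the reminder; it is not part of any exercise.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def two_sided(c):
    """Pr[|Z| <= c] for a standard normal Z."""
    return Z.cdf(c) - Z.cdf(-c)

def one_sided(c):
    """Pr[Z <= c] for a standard normal Z."""
    return Z.cdf(c)

for c in (1.64, 1.96, 2.58):
    print(f"Pr[|Z| <= {c}] = {two_sided(c):.4f}")
for c in (1.28, 1.64, 2.33):
    print(f"Pr[Z <= {c}] = {one_sided(c):.4f}")
```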
Exercise 1 (41 marks)
Julie Cullen and Stephen Levitt investigated the connection between a city’s population and its
crime rate. To capture the adjustments made by city residents when crime rates change, they
focused on changes in population and crime rates. They estimated the following model:
d1_pop = β0 + β1 c1_crim + β2 c1_scr + β3 c1_unem + u
where d1_pop: one-year change in log population
c1_crim: one-year change in the city’s crime rate (number of crimes divided by population)
c1_scr: one-year change in the crime rate of the rest of the urban area
c1_unem: one-year change in the local unemployment rate
They also had data on changes in the log of the rate at which the state sends people to prison
and releases people from prison:
dz1_ccom, dz2_ccom and dz3_ccom: prior-year, 2-year-prior and 3-year-prior changes in the
log of the rate at which the state sends people to prison
dz1_rele, dz2_rele and dz3_rele: prior-year, 2-year-prior and 3-year-prior changes in the log of
the rate at which the state releases people from prison.
Here we use a subsample of their data: around 900 observations on 127 cities.
1. (4) Explain what the dependent variable, defined as the change in log population,
represents. What is the advantage of defining the dependent variable in this way?
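The log change approximates the proportional (percentage) change when changes are small, which is why d1_pop can be read as a population growth rate. A quick numerical check, using hypothetical population figures that are not taken from the Cullen-Levitt data:

```python
import math

def log_change(x_new, x_old):
    """One-period change in the natural log: log(x_new) - log(x_old)."""
    return math.log(x_new) - math.log(x_old)

def pct_change(x_new, x_old):
    """Exact proportional change: (x_new - x_old) / x_old."""
    return (x_new - x_old) / x_old

# Hypothetical city populations (illustrative only).
for old, new in [(500_000, 495_000), (500_000, 510_000), (500_000, 600_000)]:
    print(old, new, round(log_change(new, old), 4), round(pct_change(new, old), 4))
```

The two measures agree closely for 1% or 2% changes and drift apart for a 20% change.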
2. (9) Using the OLS estimates in Table 1, interpret the coefficients and t-statistics of the
explanatory variables.
3. (4) Explain why the two crime rate variables may be endogenous.
4. (8) Cullen and Levitt instrumented the two crime rate variables with the last six variables,
dz1_ccom to dz3_rele. Explain why these variables can a priori be valid instruments
for the changes in crime rates. Are the parameters of the crime rate variables exactly
identified, overidentified, or underidentified?
5. (6) How do IV estimation results compare to OLS results?
6. (6) The residual from IV estimation is saved as Resid01. Explain what is done in Table
2. What can you conclude from these results?
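Assuming Table 2 is the usual auxiliary regression for a test of overidentifying restrictions (Sargan test), the statistic is n*R^2, asymptotically chi-squared with m - k degrees of freedom under the null that all instruments are exogenous (with homoskedastic errors). Here m = 6 instruments and k = 2 endogenous regressors. A sketch of the arithmetic with the figures in Table 2; the helper name is illustrative:

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability Pr[chi2(df) > x], valid for even df only.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df % 2:
        raise ValueError("this closed form needs an even df")
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

n = 889          # included observations in Table 2
r2 = 0.009339    # R-squared of the residual regression in Table 2
m, k = 6, 2      # instruments (dz1_ccom ... dz3_rele), endogenous regressors

J = n * r2       # Sargan statistic n * R^2
df = m - k       # number of overidentifying restrictions
p = chi2_sf_even_df(J, df)
print(f"J = {J:.2f}, df = {df}, p-value = {p:.3f}")
```

This gives J of about 8.30 with a p-value of about 0.08; the 5% critical value of chi-squared(4) is about 9.49.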
7. (4) Using the results in Table 3, what can you say about the instruments? What is your
overall conclusion?
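Table 3 is the first-stage regression for C1_CRIM. Its reported F-statistic can be reproduced from R-squared with the classical formula, which shows that it tests all seven regressors jointly (including C1_UNEM) and does not use the White standard errors. The common "F > 10" rule of thumb for weak instruments refers instead to the F-statistic on the excluded instruments only, which is not reported here. A sketch (the function name is illustrative):

```python
def overall_f(r2, n, k):
    """Classical F-statistic for H0: all k slope coefficients are zero.

    Computed from R-squared; n is the number of observations and the
    regression includes a constant.
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))

F3 = overall_f(r2=0.039366, n=948, k=7)   # Table 3 (first stage for C1_CRIM)
F2 = overall_f(r2=0.009339, n=889, k=7)   # Table 2 (residual regression)
print(round(F3, 2), round(F2, 2))
```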
Exercise 2 (36 marks)
Three researchers (F. Cai, M. Maurer-Fazio and X. Meng) have studied the determinants of
membership in the Communist Party in China. Here we use a subsample of 1018 working-age adults from
six Chinese cities. The variables are:
Pmember: = 1 if the individual is a Communist Party member, 0 otherwise
Beijing, Changchun, Nanjing, Tianjing, Wuhan, Xian: dummy variables = 1 if the individual
lives in that city
Female: = 1 if the individual is female
Yrs_edu: Years of education of individual
Edu_mother: Years of education of individual’s mother
Edu_father: Years of education of individual’s father
1. (7) Explain the similarities and differences between probit and logit analysis (what we
aim to explain, the model, the estimation method, the results, …)
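Both models map the linear index z = X*beta into a probability through an S-shaped CDF: the standard normal CDF for probit and the logistic CDF for logit. The logistic distribution has variance pi^2/3 rather than 1, so logit coefficients are scaled up relative to probit ones. A short sketch comparing the two links; the factor 1.6 is a commonly cited rule of thumb, not an exact relation:

```python
import math
from statistics import NormalDist

def probit_cdf(z):
    """Standard normal CDF, the probit link Phi(z)."""
    return NormalDist().cdf(z)

def logit_cdf(z):
    """Logistic CDF, the logit link Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Both links equal 0.5 at z = 0 and are symmetric around it;
# Lambda(1.6 z) stays close to Phi(z) over the usual range of z.
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(z, round(probit_cdf(z), 3), round(logit_cdf(z), 3), round(logit_cdf(1.6 * z), 3))
```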
2. (5) Why was the variable Tianjing omitted? How would the probit results change if
Nanjing were omitted instead?
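Changing the omitted city only re-expresses the same model: the constant absorbs the new base city's coefficient and every other city coefficient becomes a difference from it, so the maximised likelihood and all predicted probabilities are unchanged. A sketch with the probit city coefficients from Table 4 (the `rebase` helper is hypothetical). The new standard errors and z-statistics cannot be recovered from Table 4 alone, since they need the covariances of the estimates.

```python
# Probit city coefficients from Table 4 (Tianjing is the omitted base city).
probit = {
    "C": -3.711365,
    "BEIJING": 1.220846,
    "CHANGCHUN": 0.651123,
    "NANJING": 0.178145,
    "WUHAN": 1.114837,
    "XIAN": 0.548500,
    "TIANJING": 0.0,  # base category: coefficient normalised to zero
}

def rebase(coefs, new_base):
    """Re-express the city dummies relative to a different omitted city.

    The constant absorbs the new base city's effect, and each remaining
    city coefficient becomes its difference from the new base.
    """
    shift = coefs[new_base]
    out = {"C": coefs["C"] + shift}
    for city, b in coefs.items():
        if city not in ("C", new_base):
            out[city] = b - shift
    return out

nanjing_base = rebase(probit, "NANJING")
print(nanjing_base)
```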
3. (5) Explain how to compute the effect of living in Beijing on the probability of being a
party member, for example in the probit model.
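Because BEIJING is a dummy, its effect is a discrete change: the probability with BEIJING = 1 minus the probability with BEIJING = 0 (Tianjing, the base city), holding the other covariates fixed. The effect depends on those covariates, so it has to be evaluated at chosen values (for instance the sample means, or for each individual and then averaged). A sketch using the probit coefficients from Table 4; the covariate values are hypothetical and not sample means:

```python
from statistics import NormalDist

Phi = NormalDist().cdf

# Probit coefficients from Table 4.
b = {"C": -3.711365, "BEIJING": 1.220846, "FEMALE": -0.002611,
     "YRS_EDU": 0.158489, "EDU_MOTHER": -0.057654, "EDU_FATHER": 0.019273}

def beijing_effect(female, yrs_edu, edu_mother, edu_father):
    """Pr[member | Beijing] - Pr[member | Tianjing], other covariates held fixed.

    Tianjing is the omitted city, so all other city dummies are set to 0.
    """
    index = (b["C"] + b["FEMALE"] * female + b["YRS_EDU"] * yrs_edu
             + b["EDU_MOTHER"] * edu_mother + b["EDU_FATHER"] * edu_father)
    return Phi(index + b["BEIJING"]) - Phi(index)

# Hypothetical individual (illustrative values only).
effect = beijing_effect(female=0, yrs_edu=12, edu_mother=6, edu_father=8)
print(round(effect, 3))
```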
4. (10) From Table 4, what can you say about the effects of the education variables
(yrs_edu, edu_mother, edu_father) in each of the two models? Can you suggest some
explanations for the relative differences between the effects of these variables?
5. (6) How do these effects compare between the two models?
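Probit and logit coefficients are not directly comparable because the two models normalise the error variance differently (1 versus pi^2/3). One quick comparison is the ratio of logit to probit coefficients, using the estimates in Table 4. A commonly cited rule of thumb puts this ratio near 1.6; the ratios here are somewhat larger, and the comparison that matters most is between marginal effects.

```python
# Coefficients from Table 4: (probit, logit).
coefs = {
    "BEIJING": (1.220846, 2.687616),
    "WUHAN": (1.114837, 2.502861),
    "YRS_EDU": (0.158489, 0.369245),
    "EDU_MOTHER": (-0.057654, -0.120695),
    "EDU_FATHER": (0.019273, 0.036201),
}

# Same sign and similar ratios indicate the same qualitative conclusions.
for name, (p, l) in coefs.items():
    print(f"{name:<11} logit/probit = {l / p:.2f}")
```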
6. (4) Do the logit and probit models yield qualitatively different conclusions? Explain.
Exercise 3 (22 marks)
A U.S. researcher studies the index of industrial production IPt. He has monthly data on IPt from
January 1934 to July 2007. He defines his dependent variable as Yt = 1200 log (IPt / IPt-1).
1. (5) What does Yt measure? Why does the researcher use Yt instead of IPt?
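Since log(IPt / IPt-1) is approximately the monthly proportional change in the index, multiplying by 1200 (12 months times 100) expresses it as an annualized percentage growth rate. A sketch with hypothetical index values:

```python
import math

def annualized_growth(ip_now, ip_prev):
    """Y_t = 1200 * log(IP_t / IP_{t-1}).

    Monthly log growth, times 12 to annualize and times 100 for percent.
    """
    return 1200 * math.log(ip_now / ip_prev)

# Hypothetical index values: a 0.5% rise in one month.
y = annualized_growth(100.5, 100.0)
print(round(y, 2))
```

A 0.5% monthly rise corresponds to roughly 6% growth at an annual rate.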
2. (4) An autoregressive model yields
$$\hat{Y}_t = \underset{(0.770)}{1.899} + \underset{(0.092)}{0.454}\,Y_{t-1} + \underset{(0.093)}{0.003}\,Y_{t-2} + \underset{(0.099)}{0.061}\,Y_{t-3} + \underset{(0.073)}{0.002}\,Y_{t-4}$$
T = 877, adjusted R-squared = 0.22
Are the coefficients of the lags of Yt significant (no detailed computation)?
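Assuming the figures in parentheses are standard errors, each t-statistic is the coefficient divided by its standard error, compared with 1.96 for a two-sided 5% test. A sketch of that arithmetic for the AR(4) estimates:

```python
# (name, coefficient, standard error) from the AR(4) estimates.
coefs = [("constant", 1.899, 0.770), ("Y_{t-1}", 0.454, 0.092),
         ("Y_{t-2}", 0.003, 0.093), ("Y_{t-3}", 0.061, 0.099),
         ("Y_{t-4}", 0.002, 0.073)]

t_stats = {name: b / se for name, b, se in coefs}
for name, t in t_stats.items():
    flag = "significant at 5%" if abs(t) > 1.96 else "not significant at 5%"
    print(f"{name}: t = {t:.2f} ({flag})")
```

Individual t-tests say nothing about joint significance; testing that lags 2 to 4 are jointly zero would require an F-test.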
3. (4) The correlogram of the residuals is shown below. What can you say? What does this
imply for the calculation of standard errors?
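A correlogram plots the sample autocorrelations r_j of the residuals against the lag j. Under white noise, each r_j is approximately N(0, 1/T), so values outside about +/-1.96/sqrt(T) suggest remaining serial correlation. A minimal sketch of the computation (the function names are illustrative):

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t - j] for t in range(j, n)) / denom
            for j in range(1, max_lag + 1)]

def white_noise_band(n):
    """Approximate 95% band +/- 1.96 / sqrt(n) for r_j under white noise."""
    return 1.96 / math.sqrt(n)

# With T = 877 residuals, autocorrelations beyond about +/-0.066 stand out.
print(round(white_noise_band(877), 3))
```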
4. (2) Worried about potential seasonal fluctuations, the forecaster adds Yt-12 to the
autoregression and obtains
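$$\hat{Y}_t = \underset{(0.706)}{2.112} + \underset{(0.100)}{0.473}\,Y_{t-1} - \underset{(0.700)}{0.065}\,Y_{t-2} + \underset{(0.092)}{0.102}\,Y_{t-3} - \underset{(0.069)}{0.049}\,Y_{t-4} + \underset{(0.034)}{0.102}\,Y_{t-12}$$
T = 869, adjusted R-squared = 0.26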
Is the worry of the researcher justified?
5. (2) Does this model seem suited to the researcher’s objective?
6. (5) The forecaster includes in his equation four lags of the change in the interest rate Rt on
three-month U.S. Treasury bills. Explain how to test whether Δ Rt "Granger-causes" Yt.
The test statistic is 3.15 with a p-value of 0.0138. What do you conclude?
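A Granger causality test regresses Yt on its own lags and the four lags of Δ Rt and uses an F-test of the null hypothesis that the four coefficients on the lags of Δ Rt are all zero. As a rough check on the reported p-value, q*F is approximately chi-squared(q) in large samples; a sketch with q = 4:

```python
import math

q = 4       # restrictions: the four lags of Delta R_t
F = 3.15    # reported test statistic

# Large-sample approximation: q * F is approximately chi-squared(q).
# For even q the upper tail has the closed form used below.
x = q * F
p_approx = math.exp(-x / 2) * sum((x / 2) ** k / math.factorial(k) for k in range(q // 2))
print(round(p_approx, 4))
```

This gives about 0.013, close to the reported 0.0138; the exact F(4, T - k) p-value is slightly larger.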
Table 1: Estimation results for Exercise 1 (t-statistics in parentheses)
OLS IV
C1_crim -1.065131 -2.062260
(7.177668) (-1.960989)
C1_scr 0.147476
0.775509 3.653310
(-7.222424) (1.071624)
C1_unem 0.469871 0.441981
(2.777723) (4.737956)
Constant 0.006462 0.006172
(5.958541) (3.596782)
R2 0.117
N 889 889
Table 2:
Dependent Variable: RESID01
Method: Least Squares
Included observations: 889 after adjustments
White Heteroskedasticity-Consistent Standard Errors & Covariance
Variable Coefficient Std. Error t-Statistic Prob.
C -0.001312 0.001213 -1.081161 0.2799
DZ1_CCOM -0.324250 0.457345 -0.708984 0.4785
DZ2_CCOM 0.357009 0.571135 0.625086 0.5321
DZ3_CCOM 0.528592 0.583589 0.905761 0.3653
DZ1_RELE 0.460050 0.507976 0.905652 0.3654
DZ2_RELE 0.384105 0.449965 0.853633 0.3935
DZ3_RELE 0.527382 0.463627 1.137515 0.2556
C1_UNEM -0.080667 0.097164 -0.830218 0.4066
R-squared 0.009339 Mean dependent var -3.70E-18
Adjusted R-squared 0.001468 S.D. dependent var 0.028200
Log likelihood 1915.562 F-statistic 1.186510
Durbin-Watson stat 1.990225 Prob(F-statistic) 0.307711
Table 3:
Dependent Variable: C1_CRIM
Method: Least Squares
Included observations: 948 after adjustments
White Heteroskedasticity-Consistent Standard Errors & Covariance
Variable Coefficient Std. Error t-Statistic Prob.
C 0.002617 0.000342 7.651031 0.0000
DZ1_CCOM -0.213433 0.127653 -1.671979 0.0949
DZ2_CCOM -0.436621 0.160717 -2.716710 0.0067
DZ3_CCOM 0.822804 0.141253 5.825033 0.0000
DZ1_RELE 0.199656 0.134220 1.487530 0.1372
DZ2_RELE 0.455046 0.137533 3.308641 0.0010
DZ3_RELE 0.494745 0.125010 3.957648 0.0001
C1_UNEM 0.023653 0.029509 0.801550 0.4230
R-squared 0.039366 Mean dependent var 0.001794
Adjusted R-squared 0.032213 S.D. dependent var 0.008264
S.E. of regression 0.008130 Akaike info criterion -6.778122
Sum squared resid 0.062130 Schwarz criterion -6.737157
Log likelihood 3220.830 F-statistic 5.502944
Durbin-Watson stat 1.795456 Prob(F-statistic) 0.000003
Table 4: Probit and Logit results
Probit results Logit results
Variable Coefficient z-Statistic Prob. Coefficient z-Statistic Prob.
C -3.711365 -6.737859 0.0000 -7.901319 -6.533589 0.0000
BEIJING 1.220846 4.666702 0.0000 2.687616 4.285626 0.0000
CHANGCHUN 0.651123 2.190568 0.0285 1.534682 2.195713 0.0281
NANJING 0.178145 0.553061 0.5802 0.507452 0.647166 0.5175
WUHAN 1.114837 4.217908 0.0000 2.502861 3.983577 0.0001
XIAN 0.548500 1.754342 0.0794 1.219102 1.686014 0.0918
FEMALE -0.002611 -0.021202 0.9831 0.033667 0.139872 0.8888
YRS_EDU 0.158489 3.356871 0.0008 0.369245 3.904545 0.0001
EDU_MOTHER -0.057654 -2.544526 0.0109 -0.120695 -2.765462 0.0057
EDU_FATHER 0.019273 0.949303 0.3425 0.036201 0.907404 0.3642
Econ 435-5: Quantitative methods in economics Summer 2006
Final exam
Wednesday 9 August 15.30-18.15
The exam is divided into four exercises. The whole exam is graded on 100 points. For each part you
are given the corresponding number of points (I reserve the right to modify them slightly).
Reminder: If Z is distributed as a N(0,1), then
$\Pr[|Z| \le 1.64] \approx 0.90$, $\Pr[Z \le 1.28] \approx 0.90$,
$\Pr[|Z| \le 1.96] \approx 0.95$, $\Pr[Z \le 1.64] \approx 0.95$,
$\Pr[|Z| \le 2.58] \approx 0.99$, $\Pr[Z \le 2.33] \approx 0.99$.
Exercise 1 (20 points)
A researcher has annual data for 1973-2002 on aggregate consumption Ct and aggregate income Yt
for a certain country. He is interested in exploring the relationship between Ct and Yt, allowing
for short-run dynamics, and fits the following regressions:
(1) an ADL regression of Ct on Yt, Ct-1 and Yt-1 using ordinary least squares (OLS)
(2) an OLS regression of Ct on Yt
The table shows the regression results. Robust standard errors are in parentheses. RSS is the
residual sum of squares.
(1) (2)
Yt 0.23 0.72
(0.03) (0.07)
Ct-1 0.48
(0.12)
Yt-1 0.19
(0.04)
constant -15.21 130.02
(18.27) (42.15)
RSS 2332.0 4195.2
1. The researcher has assumed that the pair (Ct, Yt) is stationary. Explain what this means.
2. Explain in words what robust (Newey-West) standard errors are (use the example).
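In a time-series regression such as (2), the variance of the OLS slope depends on the long-run variance of v_t = (Y_t - mean of Y) * u_t, which includes autocovariances when u_t is serially correlated. Newey-West standard errors estimate that long-run variance by adding sample autocovariances up to a truncation lag m, with Bartlett weights 1 - j/(m + 1). A minimal sketch for a generic series v (the function names are illustrative; in practice v would be built from the regression residuals):

```python
import math

def newey_west_lrv(v, lags):
    """Newey-West (Bartlett-kernel) estimate of the long-run variance of v_t.

    Adds the sample autocovariances gamma_j for j = -lags..lags,
    down-weighting lag j by w_j = 1 - j / (lags + 1).
    """
    n = len(v)
    mean = sum(v) / n
    d = [x - mean for x in v]
    lrv = sum(x * x for x in d) / n                      # gamma_0
    for j in range(1, lags + 1):
        gamma_j = sum(d[t] * d[t - j] for t in range(j, n)) / n
        lrv += 2 * (1 - j / (lags + 1)) * gamma_j
    return lrv

def hac_se_of_mean(v, lags):
    """HAC (Newey-West) standard error of the sample mean of v."""
    return math.sqrt(newey_west_lrv(v, lags) / len(v))
```

With lags = 0 this reduces to the usual variance estimate. One textbook rule of thumb sets m = 0.75 T^(1/3), which is about 2 for roughly 30 annual observations.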
3. Show that specification (2) is a simplification of specification (1). How would you test
whether it is an acceptable simplification?
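Specification (2) imposes two restrictions on (1): the coefficients on Ct-1 and Yt-1 are zero. With the reported RSS values, the homoskedasticity-only F-statistic can be sketched as below. Two assumptions apply: both regressions use the same 29 observations (1974-2002, since (1) loses a year to the lags), and errors are homoskedastic. Because the table reports robust standard errors, a robust Wald test would be preferable but needs the full covariance matrix, which is not shown.

```python
def f_test_rss(rss_r, rss_u, q, n, k_u):
    """Homoskedasticity-only F-statistic for q linear restrictions.

    rss_r: restricted RSS, rss_u: unrestricted RSS,
    n: observations, k_u: parameters in the unrestricted model (incl. constant).
    """
    return ((rss_r - rss_u) / q) / (rss_u / (n - k_u))

# Restrictions: coefficients on C_{t-1} and Y_{t-1} are zero (q = 2).
# Assumption: both regressions are fitted on the same 29 observations.
F = f_test_rss(rss_r=4195.2, rss_u=2332.0, q=2, n=29, k_u=4)
print(round(F, 2))
```

The 5% critical value of F(2, 25) is about 3.39.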
4. Construct 95% confidence intervals for the dynamic effects (multipliers) of income on
consumption in specification (1). How do you interpret them?
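In the ADL(1,1) model Ct = b0 + b1 Yt + b2 Ct-1 + b3 Yt-1, the dynamic multipliers are m0 = b1, m1 = b3 + b2 b1 and m_h = b2 m_(h-1) for h >= 2. The cumulative (long-run) effect is (b1 + b3)/(1 - b2). From the table, only the impact multiplier's 95% interval follows directly (0.23 +/- 1.96 * 0.03). Intervals for the later multipliers are nonlinear in the estimates and would need their covariance matrix (delta method), which is not reported. A sketch:

```python
# ADL(1,1) estimates from specification (1):
# C_t = b0 + b1*Y_t + b2*C_{t-1} + b3*Y_{t-1}
b1, se_b1 = 0.23, 0.03
b2 = 0.48
b3 = 0.19

def dynamic_multipliers(b1, b2, b3, horizon):
    """Effect on C_{t+h} of a one-unit, one-period change in Y_t, for h = 0..horizon."""
    m = [b1, b3 + b2 * b1]
    while len(m) <= horizon:
        m.append(b2 * m[-1])
    return m[: horizon + 1]

impact_ci = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)   # 95% interval for m0
long_run = (b1 + b3) / (1 - b2)                      # effect of a permanent change
print(impact_ci, dynamic_multipliers(b1, b2, b3, 4), round(long_run, 3))
```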
Exercise 2 (28 points)
A researcher has the following data for a random sample of 1,498 females drawn from the United
States National Longitudinal Survey of Youth: weight in kilos, height in centimeters, years of
schooling, age, marital status in the form of a dummy variable MARRIED defined to be 1 if the
respondent was married, 0 if single, and ethnicity in the form of a dummy variable BLACK defined to
be 1 if the respondent was black, 0 otherwise. These data were obtained for 1985 and 2000 for the
same women. The respondents were aged between 20 and 27 in 1985. Women who were divorced in
either 1985 or 2000 were excluded from the sample. The researcher fits two regressions:
(1) an ordinary least squares (OLS) regression combining the observations for 1985 and the
observations for 2000 with weight as the dependent variable and years of schooling,
MARRIED, height, age, and BLACK as explanatory variables
(2) a first differences (FD) regression with the change in weight from 1985 to 2000 as the
dependent variable and the change in years of schooling, the change in MARRIED, and the
change in age (15 years for all respondents) over the same period as explanatory variables.
The FD regression was estimated without a constant.
The results of these regressions are shown in the table, with t-statistics given in parentheses below
each estimated coefficient.
1. Explain theoretically why OLS and FD regressions may yield different estimates of the
parameters of the model.
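One theoretical reason for the difference is an unobserved, time-invariant individual effect that is correlated with the regressors. Pooled OLS leaves it in the error term and is biased, whereas first-differencing removes it. The simulation below illustrates this with a hypothetical data-generating process (true slope 1, individual effect a_i entering both x and y). It is not a model of the NLSY data.

```python
import random

def simulate_panel(n=2000, beta=1.0, gamma=2.0, seed=1):
    """Hypothetical two-period panel: y_it = beta*x_it + gamma*a_i + u_it.

    x_it = a_i + e_it, so the unobserved effect a_i is correlated with x_it.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.gauss(0, 1)
        rows = []
        for _t in range(2):
            x = a + rng.gauss(0, 1)
            y = beta * x + gamma * a + rng.gauss(0, 1)
            rows.append((x, y))
        data.append(rows)
    return data

def pooled_ols_slope(data):
    """OLS slope of y on x (with intercept), pooling both periods."""
    xs = [x for rows in data for x, _ in rows]
    ys = [y for rows in data for _, y in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def fd_slope(data):
    """First-difference slope without a constant: a_i drops out of dy = beta*dx + du."""
    dx = [rows[1][0] - rows[0][0] for rows in data]
    dy = [rows[1][1] - rows[0][1] for rows in data]
    return sum(p * q for p, q in zip(dx, dy)) / sum(p * p for p in dx)

panel = simulate_panel()
print(round(pooled_ols_slope(panel), 2), round(fd_slope(panel), 2))
```

In this design pooled OLS converges to about 2 (1 plus the omitted-effect bias), while the FD estimate stays close to the true value of 1.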
2. Compare the results for the coefficients of schooling and MARRIED in the OLS regression
and the FD regression. Give a possible intuitive explanation of the difference in results.
3. Explain why height and BLACK are excluded from the FD regression.
4. The change in age from 1985 to 2000 is the same for all respondents. Discuss the
implications, if any, for the FD regression.
5. R2 is much higher for the FD regression than for the OLS regression. Does this imply that the
FD regression is a better specification?
6. When the number of individuals is small, explain precisely how one can test whether
individual-specific fixed effects are jointly significant. (Here the number of individuals is
large, so the test is not practical.)
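With few individuals, one option is to estimate the model with one dummy per individual (least squares dummy variables) and compare its RSS with that of pooled OLS with a single intercept, using an F-test of n - 1 restrictions. A sketch of the statistic; the inputs in the usage line are made up for illustration:

```python
def fixed_effects_f(rss_pooled, rss_fe, n_individuals, n_obs, k):
    """F-statistic for H0: all individual-specific intercepts are equal.

    rss_pooled: RSS of pooled OLS with one common intercept (restricted model)
    rss_fe:     RSS of OLS with one dummy per individual (unrestricted, LSDV)
    k:          number of slope coefficients in the unrestricted model
    Returns the statistic and its degrees of freedom; under H0 and the classical
    assumptions it follows F(n_individuals - 1, n_obs - n_individuals - k).
    """
    q = n_individuals - 1
    df_resid = n_obs - n_individuals - k
    f_stat = ((rss_pooled - rss_fe) / q) / (rss_fe / df_resid)
    return f_stat, (q, df_resid)

# Illustrative (made-up) inputs: 10 individuals observed twice, 3 slope coefficients.
f_stat, dfs = fixed_effects_f(rss_pooled=150.0, rss_fe=100.0, n_individuals=10, n_obs=20, k=3)
print(round(f_stat, 3), dfs)
```

Time-invariant regressors such as height and BLACK cannot be included alongside the individual dummies.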
OLS FD
Years of schooling -0.88 -0.06
(-7.41) (-0.25)
Married -3.27 0.01
(-5.28) (0.02)
Height (cm) 0.37
(11.51)
Age 0.82 0.72
(22.06) (28.26)
Black 6.12
(7.43)
constant -5.52
(-1.03)
R2 0.20 0.49
N 2,996 1,498
Exercise 3 (22 points)
A researcher interested in the relationship between parenting, age and schooling has data for the year
2000 for a random sample of 1,167 married males and 870 married females aged 35 to 42. In
particular, she is interested in how the presence of young children in the household is related to the
age and education of the respondent. She defines CHILDL6 to be 1 if there is a child less than 6
years old in the household and 0 otherwise and regresses it on AGE, which represents age, and S, the
years of schooling, for males and females separately using probit analysis. Defining the probability
of having a child less than 6 in the household to be p = Φ(Z) and the index Z as
Z = β0 + β1 AGE + β2 S
she obtains the results shown in the table (standard errors in parentheses).
males females
AGE -0.137 -0.154
(0.018) (0.023)
S 0.132 0.094
(0.015) (0.020)
constant 0.194 0.547
(0.358) (0.492)
Zm -0.399 -0.874
f(Zm) 0.368 0.272
For males and females separately, she calculates the index corresponding to the sample mean
values of AGE and S using the estimated probit coefficients. This index is denoted by Zm.
She further calculates f(Zm), where f(.) is the derivative of Φ(.). The values of Zm and f(Zm)
are shown in the table.
1. Describe with a diagram the shape of the probability function Φ(Z) and explain why it
has that shape.
2. Explain, without technical details, how probit analysis differs from OLS
(estimation method, results, interpretation, …)
3. Explain how to derive the marginal effects of the explanatory variables on the probability
of having a child less than 6 in the household. Calculate for both males and females the
marginal effects at the sample means of AGE and S. Explain whether the signs of the
marginal effects are plausible.
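Since p = Φ(Z) with Z = β0 + β1 AGE + β2 S, the chain rule gives ∂p/∂x_j = f(Z) β_j. At the sample means this is f(Zm) β_j. The sketch below recomputes f(Zm) from the reported Zm as a check and then forms the marginal effects from the tabulated coefficients:

```python
import math

def phi(z):
    """Standard normal density f(z) = dPhi/dz."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Probit estimates and index at the sample means, from the table.
results = {
    "males":   {"AGE": -0.137, "S": 0.132, "Zm": -0.399},
    "females": {"AGE": -0.154, "S": 0.094, "Zm": -0.874},
}

marginal = {}
for group, r in results.items():
    density = phi(r["Zm"])   # should match the reported f(Zm)
    marginal[group] = {v: density * r[v] for v in ("AGE", "S")}
    print(group, round(density, 3), {v: round(m, 4) for v, m in marginal[group].items()})
```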
4. At a seminar someone asks the researcher whether the marginal effect of S is different
for males and females. The researcher does not know how to test whether the difference
is significant and asks you for advice. What would you advise her to do?
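A simple starting point compares the S coefficients from the two independent samples with a z-statistic, since the variance of a difference of independent estimates is the sum of their variances. This is only a sketch. It tests the probit coefficients, not the marginal effects, which also depend on f(Zm) and would need delta-method standard errors. Probit coefficients are also normalised by each sample's own error scale. An alternative is a pooled probit with an interaction between S and a female dummy.

```python
import math
from statistics import NormalDist

# Probit coefficients on S and their standard errors, from the table.
b_m, se_m = 0.132, 0.015   # males
b_f, se_f = 0.094, 0.020   # females

# Independent samples: Var(b_m - b_f) = se_m^2 + se_f^2.
z = (b_m - b_f) / math.sqrt(se_m**2 + se_f**2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_value, 3))
```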
Exercise 4 (30 points)
In year t, aggregate demand for a certain commodity, QDt, is related to its price, Pt , and aggregate
income, Yt , as
QDt = β0 + β1 Pt + β2 Yt + UDt
Aggregate supply in year t, QSt is a simple function of Pt
QSt = α0 + α1 Pt + USt
UDt and USt are error terms that are distributed identically over time and independently of each other.
The market clears in each year, so that QDt = QSt .
To compare the properties of ordinary least squares (OLS) and instrumental variables (IV)
estimators in such a model, a researcher performed a Monte Carlo experiment with the following
equations:
QDt = 10 - 0.2 Pt + 0.05 Yt + UDt
QSt = 5 + 0.1 Pt + USt
The sample size was 30. Yt was 1,000 in the first observation, 1,050 in the second, rising in steps of
50 to 2,450. The variance of Yt was 187,292. UDt and USt were generated as random numbers from
normal distributions with mean 0 and variances 400 and 100, respectively. The researcher fitted the
supply equation ten times, first using OLS and then using IV, with Yt acting as an instrument for Pt.
The results are tabulated in rows 1-10 of the table. Then, increasing the sample size to 30,000, but
keeping the same data for Y (repeating the series 1,000-2,450 one thousand times), she fitted the
model for a single sample, with the results shown in the last row of the table.
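The large-sample part of this experiment can be sketched in Python. The code uses the stated design values and solves the market-clearing condition for Pt. Its random draws differ from the researcher's, so the estimates will be close to, but not identical with, the last row of the table. The quoted variance of Yt, 187,292, corresponds to dividing by 30 (the population form).

```python
import random

def simulate_market(n_repeats=1000, seed=2007):
    """Simulate the Monte Carlo design described in the exercise.

    Y_t runs 1000, 1050, ..., 2450 and the 30-value series is repeated n_repeats times.
    Demand: Q = 10 - 0.2 P + 0.05 Y + U_D,  U_D ~ N(0, 400)
    Supply: Q =  5 + 0.1 P + U_S,           U_S ~ N(0, 100)
    Market clearing (demand = supply) gives P = (5 + 0.05 Y + U_D - U_S) / 0.3.
    """
    rng = random.Random(seed)
    base = [1000 + 50 * i for i in range(30)]
    Y, P, Q = [], [], []
    for _ in range(n_repeats):
        for y in base:
            u_d = rng.gauss(0, 20)   # standard deviation = sqrt(400)
            u_s = rng.gauss(0, 10)   # standard deviation = sqrt(100)
            p = (5 + 0.05 * y + u_d - u_s) / 0.3
            Y.append(y)
            P.append(p)
            Q.append(5 + 0.1 * p + u_s)   # quantity from the supply equation
    return Y, P, Q

def cov(a, b):
    """Sample covariance with divisor n."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / len(a)

Y, P, Q = simulate_market()
var_y = cov(Y[:30], Y[:30])   # about 187,291.67, the variance quoted in the text
ols = cov(P, Q) / cov(P, P)   # OLS slope of Q on P
iv = cov(Y, Q) / cov(Y, P)    # IV slope using Y as the instrument for P
print(round(var_y), round(ols, 4), round(iv, 4))
```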
For the purposes of this exercise, any specific problems associated with time series can be
ignored.
1. Explain why OLS would yield inconsistent estimates if used to estimate the supply equation.
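Because the market clears, the equilibrium price depends on USt, so Pt is correlated with the error of the supply equation. The probability limit of the OLS slope is therefore α1 + cov(Pt, USt)/var(Pt) rather than α1. A sketch of this calculation under the design values, treating the moments of the fixed Yt series as their limits (an assumption):

```python
# Design values from the Monte Carlo experiment in the exercise.
beta0, beta1, beta2 = 10.0, -0.2, 0.05   # demand: Q = beta0 + beta1*P + beta2*Y + U_D
alpha0, alpha1 = 5.0, 0.1                # supply: Q = alpha0 + alpha1*P + U_S
var_y = 187_291.67                       # variance of Y_t quoted in the text
var_ud, var_us = 400.0, 100.0

# Reduced form from market clearing:
# (alpha1 - beta1) * P = beta0 - alpha0 + beta2*Y + U_D - U_S
denom = alpha1 - beta1
var_p = (beta2**2 * var_y + var_ud + var_us) / denom**2
cov_p_us = -var_us / denom               # P moves with U_S, so P is endogenous

# plim of the OLS slope = alpha1 + cov(P, U_S) / var(P)
plim_ols = alpha1 + cov_p_us / var_p
print(round(plim_ols, 4))
```

This gives about 0.069, below the true value of 0.1.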
2. Explain under which conditions Yt can serve as an instrument for Pt to estimate the supply
equation. Are these conditions satisfied?
3. Show that the IV estimator of the slope coefficient α1 of the supply equation is consistent.
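One way to write the argument, assuming Yt is uncorrelated with USt and β2 ≠ 0: substituting the supply equation Qt = α0 + α1 Pt + USt into the IV estimator gives

$$
\hat{\alpha}_1^{IV} = \frac{\widehat{\operatorname{cov}}(Y_t, Q_t)}{\widehat{\operatorname{cov}}(Y_t, P_t)}
= \alpha_1 + \frac{\widehat{\operatorname{cov}}(Y_t, U_{St})}{\widehat{\operatorname{cov}}(Y_t, P_t)}
\;\xrightarrow{p}\; \alpha_1 + \frac{0}{\beta_2 \operatorname{var}(Y_t)/(\alpha_1 - \beta_1)} = \alpha_1 ,
$$

because the reduced form (α1 - β1) Pt = β0 - α0 + β2 Yt + UDt - USt implies cov(Yt, Pt) = β2 var(Yt)/(α1 - β1), which is nonzero.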
4. Discuss the estimates of α1 in the table, explaining whether they support or contradict your
answers to the previous questions.
5. Suppose now that supply in year t is governed by price in year t-1 and decisions made in
year t-1, so that
QSt = α0 + α1 Pt-1 + USt
How would this affect the estimation of the supply equation?
           OLS                          IV
           α1       s.e.(α1)   R2       α1       s.e.(α1)   R2
n = 30
1          0.073    0.017      0.383    0.088    0.023      0.367
2          0.064    0.022      0.233    0.102    0.032      0.153
3          0.076    0.020      0.346    0.098    0.025      0.319
4          0.078    0.013      0.552    0.097    0.017      0.520
5          0.067    0.015      0.406    0.118    0.035      0.176
6          0.091    0.023      0.363    0.097    0.035      0.362
7          0.071    0.023      0.252    0.090    0.034      0.235
8          0.078    0.017      0.431    0.094    0.021      0.414
9          0.081    0.016      0.484    0.110    0.023      0.423
10         0.067    0.016      0.372    0.104    0.024      0.255
n = 30,000
           0.0695   0.0005     0.367    0.1002   0.0008     0.295